entre gradientes
Solving a large scale constrained non linear program for operations research with gradient descent¶
- Task surveying
- Problem Formulation
- Introductory example
- Scipy
- CVXPY
- Solution with gradient descent
- Alternative geometric approach
- Extra comments on deployment:
- Docker
- Cloud Services
- Flask API with GPU support
- MSSQL Server
Project surveing¶
When solving data science tasks for education purposes or in online challenges (ej. Kaggle), the problem objective is given explicitly, and many times even with adequate metrics. In the industry that is not always the case, as it is frequent that clients do not bring forward a directly tractable formulation of the problem or a Key Performance Indicator. The optimization objective may not be trivial, so it is on the data scientist to define a formal statement. The task addressed is to distribute the stock for the next period, subject to different sets of constraints and avoiding large variations in a given vendor. Taking this inventory allocation case as an example, we would not expect the user to know that the adequate L-norm for his allocation strategy was L-$\infty$. More even, the hierarchy of groupings given by the user may have been coded as a high dimensional tensor, a tree, hypergraph, or flattened vector. Let alone the strategy for the solution.
A couple of initial meetings were required to design the interface, delimit the convenient functionalities, and planning the deployment infrastructure. As I managed the development completely, I had to take into account aspects like client relationship, cost estimations, validation, and deployment. Coming from a more research-oriented career, there are points to keep an eye on as for example, be very wary about promising dates and aim to overestimate 😅. Along the way, we met with the team of analysts to audit the solution and modify the scope and specs.
Problem statement¶
In the abstract, optimization in this context is the process of choosing the best possible vector $\in R^n$ from a set. In this way, it encompasses many ways of decision making, and so the reason for its ubiquitous relevance. The general formulation is:
$\begin{aligned} & \underset{x}{\text{minimize}} & & f(x) \\ & \text{subject to} & & f_i(x) \leq b_i, \; i = 1, \ldots, m. \end{aligned}$
In our case, the problem consists of allocating certain inventory to their possible outlets minimizing the variation with respect to the previous period, and while satisfying arbitrary constraints defined on hierarchical and overlapping categories. A stock of a $p \in \mathbb{N^i}$ products has to be allocated to $s \in \mathbb{N^j}$ outlets. Equality and inequality restrictions intend to modulate sales in different areas, allow compliance to commercial agreements, and respond to diverse business needs. Both $p$ and $s$ are classified in multiple hierarchical levels, e.g. : channel, area, group, delivery route... for $s$; family, product, flavor... for $p$. One possibility for the formulation is to organize this levels into a but forcing symetry would conveys extra memory consumption assigning redundant space (allocating a quantity of each product for each outlet when not all products are sold at all outlets). Therefore a vector is constructed as $\textbf{q} = (s_0p_0, ..., s_ip_j)$, $\textbf{q} \in \mathbb{R}^{i \times j}$ where $p_is_j$ corresponds to the quantity of product $i$ in outlet $j$, and excluding the elements $i,j$ that are not applicable for the period distribution. Continuous relaxation of the $q$ vector is allowed for faster computation. The optimization problem is formulated as:
If all $f_i(x)$ fulfil linearity conditions, $f_i(αx + βy) = αf_i(x) + βf_i(y)$, then the problem corresponds to a linear program. A more general class of problems consist of all that comply $f_i(αx + βy) ≤ αf_i(x) + βf_i(y)$ given $α, β \in [0,1], α + β = 1$, corresponding to the domain of convex problems. Finally, with $x \in \mathbb{Z}$, a Mixed Integer Convex Non-Linear Program can be defined.
$\begin{aligned} \underset{q}{\text{minimize }} & | ( \mathbf{q} - \mathbf{q_{0}} ) \oslash \mathbf{q_{0}} |^{\infty} \\ \text{subject to: } & \textbf{R}\textbf{q} = \textbf{b} \\ & \textbf{M}\textbf{q} \leq \textbf{d} \end{aligned}$
Where each row $r$ in $R \in \{0,1\}^{i \times j}$ represents a restriction and is defined as $r_i = 1$ if the corresponding element in $\textbf{q}$ is in the subset to which the restriction applies. The symbol $\oslash$ corresponds to element-wise (or Hadamard) division.
Modern...ish Vocational Testing
Holla(nd) mundo!¶
The Holland Occupational Themes is a theory of personality that focuses on career and vocational choice. It characterizes people on the basis of their affinity for six different categories of occupations. The six types yield the RIASEC acronym, by which the theory is also commonly known. The typology has come to dominate the field of career counseling and has been incorporated into most of the popular assessments used in the field.
We go through the following steps:
- Loading of example data
- Minimal exploration with pandas
- Result dashboard with plotly
You can take the test online in: https://openpsychometrics.org/tests/RIASEC/
HTML(holla)
sKPIng
I'd like to refresh the comercial aspect of the data world for an interview. For this I gathered a list of Key Performance Indicators to use in practice problems, scrapping from a few websites using HTTP requests, Selenium, regular expressions and XPath.
kpis = {}
for category in categories:
name = category.split('/')[-1]
print(name)
chrome.get(category)
try:
extra = chrome.find_element_by_xpath('//div[@id="featured-snippet"]/ul').text.split('\n')
except:
extra = []
kpis[name] = list(set([e.text for e in chrome.find_elements_by_xpath('//h3[@class="strong h4"]')] + extra))
sleep(3)
Spectral Playground
sampleo = 48000
rec = grabar(duracion=15,
sampleo=sampleo)
display(
Audio(
stereo2mono(
rec[sampleo:]), rate=sampleo
)
)
plot_multiple(stereo2mono(rec[sampleo:]), fs=sampleo)

Con un poco de atención, podemos observar, del gráfico superior al inferior:
- El vibrato de la flauta
- El tono de la obra (C# ~554hz, Prélude à l'après-midi d'un faune)
- La melodía
😃🤟